AITopics | gradient small stochastically

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Neural Information Processing Systems, Nov-20-2025, 22:42:40 GMT

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$,


Reviews: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Neural Information Processing Systems, Oct-7-2024, 21:00:02 GMT

This work studies convergence rates of the gradients for convex composite objectives by combining Nesterov's tricks used for gradient descent with SGD. The authors provide three approaches that differ from each other only slightly, and they provide the convergence rates for all the proposed approaches. My comments on this work are as follows: 1. It is indeed important to study convergence rates of gradients, especially for non-convex problems. The authors motivate the readers by mentioning this, but they assume convexity in their problem setup.


How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Allen-Zhu, Zeyuan

Neural Information Processing Systems, Feb-14-2020, 07:29:04 GMT

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. This is no slower than the best known stochastic version of Newton's method in all parameter regimes. Papers published at the Neural Information Processing Systems Conference.
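To make the goal of "finding a point with small gradient norm" concrete, here is a minimal sketch that runs plain SGD on a toy convex objective and reports the true gradient norm before and after. It is not the SGD3 algorithm from the abstract, whose details are not given here. The objective $f(x) = \frac{1}{2}\|x\|^2$, the Gaussian noise model, the step size, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def noisy_grad(x, sigma, rng):
    # The gradient of the assumed toy objective f(x) = 0.5 * ||x||^2 is x itself.
    # Zero-mean Gaussian noise stands in for the stochastic gradient oracle.
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def grad_norm(x):
    # Norm of the exact (noise-free) gradient of f at x, which equals ||x||.
    return math.sqrt(sum(xi * xi for xi in x))


def plain_sgd(x0, step, sigma, iters, seed=0):
    # Plain SGD with a constant step size (a baseline, not SGD3).
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = noisy_grad(x, sigma, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


x0 = [5.0] * 10
x_final = plain_sgd(x0, step=0.1, sigma=0.1, iters=1000)
initial_norm = grad_norm(x0)
final_norm = grad_norm(x_final)
print(f"gradient norm: {initial_norm:.3f} -> {final_norm:.3f}")
```

With a constant step size the gradient norm in this toy setup stalls at a noise floor rather than tending to zero, which is one informal way to see why the number of stochastic gradients needed to reach norm $\varepsilon$ is the quantity of interest.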


How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Allen-Zhu, Zeyuan

Neural Information Processing Systems, Dec-31-2018

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$, where previously SGD variants only achieve $\tilde{O}(\varepsilon^{-4})$. This is no slower than the best known stochastic version of Newton's method in all parameter regimes.
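The size of the improvements quoted in the abstract can be illustrated with a little arithmetic. The sketch below compares the stated exponents at a sample accuracy $\varepsilon = 10^{-3}$. It ignores constants and the logarithmic factors hidden by $\tilde{O}$, so the ratios are order-of-magnitude illustrations only, and the function names are hypothetical.

```python
def oracle_calls(eps, exponent):
    # Sample complexity of the form eps^(-exponent), with constants and
    # log factors dropped (an assumption made for illustration).
    return eps ** (-exponent)


def speedup(eps, old_exponent, new_exponent):
    # Ratio of the old rate to the new rate at accuracy eps.
    return oracle_calls(eps, old_exponent) / oracle_calls(eps, new_exponent)


eps = 1e-3

# Convex case: O(eps^(-8/3)) versus O~(eps^(-2)); the ratio is eps^(-2/3).
convex_speedup = speedup(eps, 8 / 3, 2)

# Nonconvex case: O~(eps^(-4)) versus O~(eps^(-3.5)); the ratio is eps^(-1/2).
nonconvex_speedup = speedup(eps, 4, 3.5)

print(f"convex: about {convex_speedup:.1f}x fewer stochastic gradients")
print(f"nonconvex: about {nonconvex_speedup:.1f}x fewer stochastic gradients")
```

At $\varepsilon = 10^{-3}$ the ratios are $\varepsilon^{-2/3} = 100$ for the convex case and $\varepsilon^{-1/2} \approx 31.6$ for the nonconvex case, and both grow as $\varepsilon$ shrinks.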

